The Integration of a Part-of-Speech Tagger into
Abstract
We describe how part-of-speech information delivered by a tagger (the MPRO tool) has been integrated into the ALEP (Advanced Language Engineering Platform) system. To this end we extended an approach described within the LS-GRAM project, which consisted in defining the Text Handling component of ALEP in such a way that so-called "messy details" are handled within this subsystem, thus keeping the (linguistic) parser free from such tasks. We extended the tagging strategy used for this purpose to normal words and modified the default tagging of words proposed by the ALEP system so that it incorporates the information delivered by the part-of-speech tagger. The resulting tagging is converted by means of "lift" rules into partial linguistic descriptions, which provide the direct input to the grammatical analysis. We show that this procedure substantially reduces the parse times of the system.

1 Background

The starting point of our work has been the treatment of so-called "messy details" within the ALEP platform [2]. Messy details are text constructs that do not lend themselves well to treatment by traditional techniques for linguistic analysis, hence their "messiness". Typical examples are numbers, codes or other (sequences of) word forms that can occur in many variations, which makes a comprehensive treatment by traditional means impossible. Word-level phenomena, such as dates, numbers and proper names, are usually the most frequently occurring messy details. Any realistic NLP application must process these constructs efficiently; the alternative is to code them individually in a lexicon and/or to implement sets of grammar rules for parsing them syntactically. This problem area was given priority in the LS-GRAM project [3], which aimed to integrate an approach to messy details into a large-scale grammar implementation. How we dealt with these phenomena has been described in two short papers [4].
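As a rough illustration of the idea in the abstract, the following Python sketch restricts a word's default lexical readings with a tag from a part-of-speech tagger, using a toy "lift" table that maps each tag to a partial feature description. Everything here is hypothetical: the tag names, the feature names (`cat`, `ntype`, `vform`), the mini-lexicon and the functions `lift`, `readings` and `combinations` are assumptions for illustration, not MPRO's tag set or ALEP's lift-rule formalism. The sketch also assumes that a tag which rules out every lexical reading is ignored, so that a tagging error falls back to the default readings.

```python
from __future__ import annotations

from math import prod

# Hypothetical "lift" table: tagger tag -> partial linguistic description.
# Tag and feature names are illustrative only, not MPRO's or ALEP's.
LIFT_RULES: dict[str, dict[str, str]] = {
    "NN": {"cat": "n", "ntype": "common"},
    "NE": {"cat": "n", "ntype": "proper"},
    "VVFIN": {"cat": "v", "vform": "fin"},
    "ADJA": {"cat": "adj"},
    "CARD": {"cat": "num"},
}

# Hypothetical mini-lexicon: every reading a word form has by default.
LEXICON: dict[str, list[dict[str, str]]] = {
    "plant": [{"cat": "n", "ntype": "common"}, {"cat": "v", "vform": "fin"}],
    "green": [{"cat": "adj"}, {"cat": "n", "ntype": "common"}],
}


def lift(tag: str) -> dict[str, str]:
    """Convert a tagger tag into a partial description (returned as a copy)."""
    return dict(LIFT_RULES[tag])


def compatible(entry: dict[str, str], partial: dict[str, str]) -> bool:
    """An entry is compatible if it agrees on every feature the partial description fixes."""
    return all(entry.get(feat) == val for feat, val in partial.items())


def readings(word: str, tag: str | None = None) -> list[dict[str, str]]:
    """Return the lexical readings of `word`, restricted by `tag` when possible."""
    entries = LEXICON.get(word, [])
    if tag is None or tag not in LIFT_RULES:
        return entries  # default tagging: keep every reading
    partial = lift(tag)
    filtered = [e for e in entries if compatible(e, partial)]
    # Assumption: if the tag rules out every reading, treat it as a tagger
    # error and fall back to the default readings.
    return filtered or entries


def combinations(tagged: list[tuple[str, str]], use_tags: bool = True) -> int:
    """Number of combinations of lexical readings a parser would have to consider."""
    return prod(len(readings(w, t if use_tags else None)) for w, t in tagged)


phrase = [("green", "ADJA"), ("plant", "NN")]
print(combinations(phrase, use_tags=False))  # 4
print(combinations(phrase, use_tags=True))   # 1
```

For the toy phrase "green plant", the untagged input yields 2 × 2 = 4 combinations of lexical readings and the tagged input yields 1. This only counts readings in an invented lexicon; it is not a measurement of ALEP parse times.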
The ALEP platform provides the user with a Text Handling (TH) component, which allows the "preprocessing" of inputs. Input texts first go through a processing chain consisting of an SGML-based tagging of their elements. The default setup of the system defines the following processing chain: the text is first converted to EDIF (Eurotra Document Interchange Format). This is followed by a sequence of recognition steps: paragraph recognition, sentence recognition and word recognition. The

¹ We would like to thank the anonymous reviewers for their valuable comments on our contributions to this workshop.
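The TH recognition chain described above (paragraphs, then sentences, then words, each marked up in SGML) can be approximated by the following Python sketch, which also types word-level messy details such as dates and numbers during tokenization, so that a parser would receive them as single typed units. It is a simplification under stated assumptions: the EDIF conversion step is omitted, and the element names `P`, `S`, `W`, the `TYPE` attribute, the regular expressions and the functions `classify`, `recognize` and `to_sgml` are hypothetical, not ALEP's actual TH rules or markup.

```python
from __future__ import annotations

import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical word-level "messy detail" patterns (not ALEP's actual TH rules).
DATE = re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
WORD = re.compile(r"\w+")
# Alternation order matters: dates before plain numbers, both before ordinary
# words; any other single non-space character is a punctuation token.
TOKEN = re.compile("|".join([DATE.pattern, NUMBER.pattern, WORD.pattern, r"[^\w\s]"]))


def classify(token: str) -> str:
    """Assign a token type; dates and numbers are the 'messy details' here."""
    if DATE.fullmatch(token):
        return "date"
    if NUMBER.fullmatch(token):
        return "number"
    if WORD.fullmatch(token):
        return "word"
    return "punct"


def recognize(text: str) -> list[list[list[tuple[str, str]]]]:
    """Paragraph -> sentence -> word recognition (EDIF conversion omitted)."""
    doc = []
    for para in re.split(r"\n\s*\n", text):
        if not para.strip():
            continue
        # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", para.strip())
        doc.append([[(tok, classify(tok)) for tok in TOKEN.findall(s)]
                    for s in sentences if s])
    return doc


def to_sgml(doc: list[list[list[tuple[str, str]]]]) -> str:
    """Render the recognized structure as SGML-like markup."""
    lines = []
    for para in doc:
        lines.append("<P>")
        for sent in para:
            words = "".join(f"<W TYPE={quoteattr(t)}>{escape(w)}</W>" for w, t in sent)
            lines.append(f"<S>{words}</S>")
        lines.append("</P>")
    return "\n".join(lines)


doc = recognize("The price rose to 3,5 percent on 12.05.1997. It fell again.\n\nNew paragraph.")
print(to_sgml(doc))
```

In this sample, "3,5" and "12.05.1997" each come out as one `W` element typed `number` and `date` rather than being split at their internal punctuation, which is the kind of work the text assigns to the TH component instead of the grammar.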
Publication date: 1997